Extracting, Linking and Integrating Data from Public Sources: A Financial Case Study
Abstract
We present Midas, a system that uses complex data processing to extract and aggregate facts from a large collection of structured and unstructured documents into a set of unified, clean entities and relationships. Midas focuses on data for financial companies and is based on periodic filings with the U.S. Securities and Exchange Commission (SEC) and Federal Deposit Insurance Corporation (FDIC). We show that, by using data aggregated by Midas, we can provide valuable insights about financial institutions either at the whole system level or at the individual company level. The key technology components that we implemented in Midas and that enable the various financial applications are: information extraction, entity resolution, mapping and fusion, all on top of a scalable infrastructure based on Hadoop. We describe our experience in building the Midas system and also outline the key research questions that remain to be addressed towards building a generic, high-level infrastructure for large-scale data integration from public sources.
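The stages named in the abstract (extraction, entity resolution, fusion) can be illustrated with a minimal in-memory sketch. This is not the Midas implementation, which the abstract says runs on Hadoop. The records, field names, suffix list, and the "latest filing wins" fusion rule below are all assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical facts, as if already extracted from filings.
# All names, dates, and values are invented for illustration.
RECORDS = [
    {"source": "SEC", "name": "Acme Bancorp, Inc.", "filed": "2010-03-01", "total_assets": 120},
    {"source": "FDIC", "name": "ACME BANCORP INC", "filed": "2010-06-30", "total_assets": 125},
    {"source": "SEC", "name": "Beta Financial Corp.", "filed": "2010-02-15", "ceo": "J. Doe"},
]

# Assumed list of legal suffixes to ignore when comparing company names.
SUFFIXES = {"inc", "corp", "co", "ltd", "llc"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def resolve(records):
    """Entity resolution: group records whose normalized names match exactly."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec["name"])].append(rec)
    return groups


def fuse(group):
    """Fusion: merge attributes, letting the most recently filed value win."""
    entity = {}
    for rec in sorted(group, key=lambda r: r["filed"]):
        for key, value in rec.items():
            if key not in ("source", "filed"):
                entity[key] = value
    entity["sources"] = sorted({r["source"] for r in group})
    return entity


def integrate(records):
    """Run resolution, then fusion, returning one entity per resolved key."""
    return {key: fuse(group) for key, group in resolve(records).items()}
```

Calling `integrate(RECORDS)` merges the two Acme records into one entity keyed by the normalized name. Exact matching on normalized names is the simplest possible resolution rule; a real system would likely need fuzzier matching and conflict-aware fusion.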
Related resources
Integrating the Population Perspective into Health System Performance Assessment (IPHA): Study Protocol for a Cross-Sectional Study in Germany Linking Survey and Claims Data of Statutorily and Privately Insured
Background: Health system performance assessment (HSPA) is a major tool for evidence-based governance in health systems, and patient/population orientation is increasingly considered an important aspect. The IPHA study aims (1) to undertake a comprehensive performance assessment of the German health system from a population perspec...
Designing a new multi-objective fuzzy stochastic DEA model in a dynamic environment to estimate efficiency of decision making units (Case Study: An Iranian Petroleum Company)
This paper presents a new multi-objective fuzzy stochastic data envelopment analysis model (MOFS-DEA) under mean chance constraints and common weights to estimate the efficiency of decision making units for their future financial periods. In the initial MOFS-DEA model, the outputs and inputs are characterized by random triangular fuzzy variables with normal distribution, in which ...
Application of Big Data Analytics in Power Distribution Network
Smart grid enhances optimization in the generation, distribution and consumption of electricity by integrating information and communication technologies into the grid. Today, utilities are moving towards smart grid applications, the most common being the deployment of smart meters in advanced metering infrastructure, and the first technical challenge they face is the huge volume of data generated ...
Assessment of Public Hospital Governance in Romania: Lessons From 10 Case Studies
Background: The Government of Romania commissioned international technical assistance to help unpack the causes of arrears in selected public hospitals. Emphasis was placed on the governance-related determinants of hospital performance in the context of the Romanian health system. Methods: The assessment was structured around a public hospital governance framewor...
Efficiency Evaluation by using mixed modeling of Data Envelopment Analysis and Balanced Scorecard- A Case Study in the banking industry
The first objective in any financial organization is to improve performance, and performance evaluation is also one of the best ways to advance operations in organizations. By utilizing different methods of performance evaluation, organizations can evaluate the effectiveness and efficiency of processes that are in accord with strategic objectives. In addition, the performance evaluation instrum...
Journal: IEEE Data Eng. Bull.
Volume: 34
Publication date: 2011